26 research outputs found

    Rapidash: Efficient Constraint Discovery via Rapid Verification

    Full text link
    Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast verification and discovery of exact DCs within the database community. Despite the significant advancements in the field, prior work exhibits notable limitations when confronted with large-scale datasets. The current state-of-the-art exact DC verification algorithm demonstrates a quadratic (worst-case) time complexity relative to the dataset's number of rows. In the context of DC discovery, existing methodologies rely on a two-step algorithm that commences with an expensive data structure-building phase, often requiring hours to complete even for datasets containing only a few million rows. Consequently, users are left without any insights into the DCs that hold on their dataset until this lengthy building phase concludes. In this paper, we introduce Rapidash, a comprehensive framework for DC verification and discovery. Our work makes a dual contribution. First, we establish a connection between orthogonal range search and DC verification. We introduce a novel exact DC verification algorithm that demonstrates near-linear time complexity, representing a theoretical improvement over prior work. Second, we propose an anytime DC discovery algorithm that leverages our novel verification algorithm to gradually provide DCs to users, eliminating the need for the time-intensive building phase observed in prior work. To validate the effectiveness of our algorithms, we conduct extensive evaluations on four large-scale production datasets. Our results reveal that our DC verification algorithm achieves up to 40 times faster performance compared to state-of-the-art approaches.Comment: comments and suggestions are welcome

    Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities

    Full text link
    Large Language Models (LLMs) are being increasingly employed in data science for tasks like data preprocessing and analytics. However, data scientists encounter substantial obstacles when conversing with LLM-powered chatbots and acting on their suggestions and answers. We conducted a mixed-methods study, including contextual observations, semi-structured interviews (n=14), and a survey (n=114), to identify these challenges. Our findings highlight key issues faced by data scientists, including contextual data retrieval, formulating prompts for complex tasks, adapting generated code to local environments, and refining prompts iteratively. Based on these insights, we propose actionable design recommendations, such as data brushing to support context selection, and inquisitive feedback loops to improve communications with AI-based assistants in data-science tools.Comment: 24 pages, 8 figure

    DataPrism: Exposing Disconnect between Data and Systems

    Get PDF
    peer reviewedAs data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, particular properties of data. E.g., a health-monitoring system that is designed under the assumption that weight is reported in lbs will malfunction when encountering weight reported in kilograms. Like software debugging, which aims to find bugs in the source code or runtime conditions, our goal is to debug data to identify potential sources of disconnect between the assumptions about some data and systems that operate on that data. We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such identification is necessary to repair data and resolve the disconnect between data and systems. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataPrism alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataPrism reports causally verified root causes-in terms of data profiles-of the system malfunction. We empirically evaluate DataPrism on seven real-world and several synthetic data-driven systems that fail on certain datasets due to a diverse set of reasons. In all cases, DataPrism identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques

    Increasing frailty is associated with higher prevalence and reduced recognition of delirium in older hospitalised inpatients: results of a multi-centre study

    Get PDF
    Purpose: Delirium is a neuropsychiatric disorder delineated by an acute change in cognition, attention, and consciousness. It is common, particularly in older adults, but poorly recognised. Frailty is the accumulation of deficits conferring an increased risk of adverse outcomes. We set out to determine how severity of frailty, as measured using the CFS, affected delirium rates, and recognition in hospitalised older people in the United Kingdom. Methods: Adults over 65 years were included in an observational multi-centre audit across UK hospitals, two prospective rounds, and one retrospective note review. Clinical Frailty Scale (CFS), delirium status, and 30-day outcomes were recorded. Results: The overall prevalence of delirium was 16.3% (483). Patients with delirium were more frail than patients without delirium (median CFS 6 vs 4). The risk of delirium was greater with increasing frailty [OR 2.9 (1.8–4.6) in CFS 4 vs 1–3; OR 12.4 (6.2–24.5) in CFS 8 vs 1–3]. Higher CFS was associated with reduced recognition of delirium (OR of 0.7 (0.3–1.9) in CFS 4 compared to 0.2 (0.1–0.7) in CFS 8). These risks were both independent of age and dementia. Conclusion: We have demonstrated an incremental increase in risk of delirium with increasing frailty. This has important clinical implications, suggesting that frailty may provide a more nuanced measure of vulnerability to delirium and poor outcomes. However, the most frail patients are least likely to have their delirium diagnosed and there is a significant lack of research into the underlying pathophysiology of both of these common geriatric syndromes

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Get PDF
    Abstract Background Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres. Methods This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries. Results In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia. Conclusion This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high– and low–middle–income countries
    corecore